Skip to content

Conversation

@hannesrudolph
Copy link
Collaborator

@hannesrudolph hannesrudolph commented Aug 6, 2025

Related GitHub Issue

Closes: #4852

Roo Code Task Context (Optional)

Issue Fixer Orchestrator workflow completed with comprehensive analysis, implementation, testing, and critical review phases.

Description

This PR fixes the apply_diff tool throwing "Cannot read properties of undefined (reading 'addChild')" errors when parsing large or complex XML files. The error occurs due to a bug/limitation in the fast-xml-parser library when handling deeply nested or complex XML structures.

The solution implements a robust dual-parser system:

  • Primary parser: fast-xml-parser (existing parser)
  • Fallback parser: Regex-based parser that activates when fast-xml-parser fails

This ensures the tool can handle edge cases where fast-xml-parser encounters parsing errors on complex XML structures.

Key Changes

  1. Dual-parser system (src/utils/xml.ts):

    • Implements fallback regex-based parser for when fast-xml-parser fails
    • Handles CDATA sections and complex nested structures
    • Provides better error messages when parsing fails
  2. Enhanced error handling (src/core/tools/multiApplyDiffTool.ts):

    • Adds XML structure validation before parsing
    • Improves error messages and telemetry
    • Handles parser failures gracefully
  3. Comprehensive tests:

    • Tests for parser fallback mechanism
    • Tests for handling complex XML structures
    • Tests for error recovery scenarios

Testing

  • Added unit tests for the fallback parser functionality
  • Added tests for XML validation
  • Tested with the complex XML examples from the original issue

Notes

The fallback parser ensures the tool remains functional even when the primary parser encounters edge cases, providing a more robust solution for handling various XML structures.

…ence (#4852)

- Implement dual-parser system with automatic fallback
- Add XMLParserManager class to encapsulate parser state
- Detect and handle xml2js interference from other extensions
- Add circuit breaker pattern to prevent infinite loops
- Add comprehensive test coverage (22 tests)
- Remove all debug logging for production readiness
Copilot AI review requested due to automatic review settings August 6, 2025 22:44
@hannesrudolph hannesrudolph requested a review from mrubens as a code owner August 6, 2025 22:44
@hannesrudolph hannesrudolph requested review from cte and jr as code owners August 6, 2025 22:44
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. bug Something isn't working labels Aug 6, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a critical issue where the apply_diff tool would hang indefinitely on large or complex XML files due to interference from external XML parsers (specifically xml2js loaded by other VSCode extensions). The solution implements a robust dual-parser system with automatic fallback and circuit breaker protection.

  • Implements dual-parser system: fast-xml-parser as primary, regex-based fallback for xml2js interference
  • Adds circuit breaker pattern to prevent repeated parser failures
  • Enhances error detection and XML validation with improved diagnostics

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
src/utils/xml.ts Core implementation of dual-parser system with fallback logic and circuit breaker
src/core/tools/multiApplyDiffTool.ts XML validation and enhanced error handling with telemetry improvements
src/core/tools/tests/multiApplyDiffTool.test.ts Comprehensive test suite covering validation, fallback scenarios, and edge cases
src/core/diff/strategies/tests/issue-4852-extended.spec.ts Extended test suite focusing on xml2js interference detection and fallback parser functionality

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 6, 2025
Copy link
Contributor

@roomote roomote bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! I've reviewed the changes and the implementation looks solid overall. The dual-parser approach with fallback is a clever solution to handle external XML parser interference. I have some suggestions for improvement that could make the solution even more robust.

- Add comprehensive documentation explaining dual-parser system
- Add size limit check (10MB) for fallback regex parser to prevent memory exhaustion
- Improve error handling for non-Error types thrown by third-party code
- Make path extraction throw explicit error instead of returning null
- Improve tag balance validation to account for self-closing tags
- Simplify error messages to be more user-friendly
- Add thread-safety documentation note for parser instance
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Aug 6, 2025
…MAX_FAILURES logic

- Remove circuit breaker pattern with MAX_FAILURES counter
- Use immediate fallback for any parse error, not just addChild errors
- Simplify error handling logic as retrying the same XML parse multiple times is unnecessary
- Update tests to reflect the simplified error handling approach
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Aug 6, 2025
@hannesrudolph hannesrudolph moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 7, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 7, 2025
@daniel-lxs
Copy link
Member

@hannesrudolph Do you think it's worth adding this just to prevent conflicts with other XML parsers? is there a way to prevent interference from them without having to implement this?

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap Aug 7, 2025
- Updated PR description to clarify fast-xml-parser throws the error, not xml2js
- Removed all xml2js interference references from code and comments
- Updated tests to reflect the actual issue (fast-xml-parser error on complex XML)
- The issue is about fast-xml-parser limitations, not external extension interference
@hannesrudolph
Copy link
Collaborator Author

PR Updated - Corrected Root Cause Description

I've updated this PR to accurately reflect the actual issue. The problem is not about interference from xml2js or other extensions, but rather a limitation/bug in fast-xml-parser itself when handling complex XML structures.

Changes Made:

  1. Updated PR description - Now correctly states that fast-xml-parser throws the "addChild" error on complex XML
  2. Removed all misleading xml2js references from:
    • Code comments in src/utils/xml.ts
    • Error messages in src/core/tools/multiApplyDiffTool.ts
    • Test descriptions and comments in test files
  3. Updated issue Bug: apply_diff errors out on large or complex XML files #4852 title - Changed from "hangs indefinitely" to "errors out" to match the actual behavior

The Real Issue:

  • fast-xml-parser throws Cannot read properties of undefined (reading 'addChild') on certain complex XML structures
  • The fallback regex parser correctly handles these edge cases
  • This is a limitation of fast-xml-parser, not interference from external extensions

@daniel-lxs Your question was spot on - there is no actual conflict with other XML parsers. The solution remains valid (fallback parser for when fast-xml-parser fails), but the attribution to external interference was incorrect.

- Fixed the root cause of fast-xml-parser errors by properly escaping content in XML examples
- Used CDATA sections for diff content that contains special characters
- This prevents the parser from misinterpreting markdown backticks as XML structure
@hannesrudolph
Copy link
Collaborator Author

Alternative Solution Found

After further investigation, I've identified the actual root cause and created a much cleaner fix in PR #6811.

The Real Issue

The problem wasn't fast-xml-parser having a bug or limitation - it was that the XML examples in multi-file-search-replace.ts contained unescaped markdown backticks that confused the parser.

The Proper Fix

PR #6811 simply uses CDATA sections to properly escape the content, which is the standard XML way to handle special characters. This eliminates the need for:

  • Fallback parser implementation
  • Error detection logic
  • Complex workarounds

I recommend closing this PR in favor of #6811, which is a much simpler and more correct solution.

@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 7, 2025
@github-project-automation github-project-automation bot moved this from PR [Changes Requested] to Done in Roo Code Roadmap Aug 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working PR - Changes Requested size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Bug: apply_diff errors out on large or complex XML files

3 participants